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Abstract 

This paper investigates the computational geometry relevant to calculations of the 
Frechet mean and variance for probability distributions on the phylogenetic tree space 
of Billera, Holmes and Vogtmann, using the theory of probability measures on spaces of 
nonpositive curvature developed by Sturm. We show that the combinatorics of geodesies 
with a specified fixed endpoint in tree space are determined by the location of the vary- 
ing endpoint in a certain polyhedral subdivision of tree space. The variance function 
associated to a finite subset of tree space is continuously differentiable within each cell of 
the corresponding subdivision. We use this subdivision to establish two iterative meth- 
ods for producing sequences that converge to the Frechet mean: one based on Sturm's 
Law of Large Numbers, and another based on descent algorithms for finding optima of 
smooth functions on convex polyhedra. We present properties and biological applications 
of Frechet means and extend our main results to more general globally nonpositively 
curved spaces composed of Euclidean orthants. 
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Introduction 

The development of statistical methods for studying phylogenetic trees, and in particular 
the search for meaningful notions of consensus tree for phylogenetic data, has been of con- 
siderable importance in biology for four decades. Starting with the problem as posed by 
Adams pQ, a great deal of research has been done, and a myriad of definitions proposed, 
relating to consensus trees in phylogenetics; see [13] for an excellent overview. The prob- 
lem has been confounded by the combinatorial nature of the trees themselves. According to 
Cranston and Rannala [18], "Phylogenetic inference has long been troubled by the difficulty 
of performing statistical analysis on tree topologies. The topologies are discrete, categorical, 
and non-nested hypotheses about the species relationships. They are not amenable to stan- 
dard summary analyses such as the calculation of means and variances and cause difficulties 
for many traditional forms of hypothesis testing." Other papers share concerns about issues 
such as these [9|. [27] . 

The introduction by Billera, Holmes, and Vogtmann of phylogenetic tree space |12] opened 
statistical analysis of tree-like data to a wide and computationally tractable variety of tech- 
niques |28j . Tree space, with its geodesic distance, is a globally nonpositively curved (abbrevi- 
ated to global NPC) space, and as a result it has convexity properties that imply uniqueness 
of means as well as other important statistical and geometric objects, while also giving a 
framework for effective computational methods to calculate these objects. One of the major 
uses of the convexity properties was the discovery by Owen and Provan [40] of a fast algorithm 
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for computing geodesies in this space (see Section [JJ for this algorithm as well as the back- 
ground tree space geometry necessary to state it). Chakerian and Holmes [T7] subsequently 
showed that phylogenetic tree space provides an excellent platform for implementing several 
distance-based statistical techniques, and Nye [36] has shown how this space can be used to 
perform principal component analysis on tree data. 

Perhaps the two most fundamental concepts of interest in statistical analysis of data are 
that of sample mean (or average) and its associated variance. The basic goal of this paper 
is to demonstrate the computational effectiveness of certain notions of statistical mean and 
variance for probability distributions on tree space. The average that we use is the Frechet 
mean, or bary center: the point in tree space that minimizes its sum of squared geodesic 
distances to the sample points (Section [2]). Our decision to use this definition is motivated 
by work of Sturm [U], who identified the Frechet mean as a theoretically rich statistical 
object associated with sampling from a specified distribution on a global NPC space (see 
Theorem 12. 4p . Frechet means in tree space and the algorithm for computing them that arises 
from Sturm's work (Algorithm I2.5[) have been independently developed by Bacak [8]. 

Specifically, our principal theoretical contribution lies in the discovery of polyhedral struc- 
ture governing the variation of geodesies in tree space as one endpoint varies (Section [3]). To 
be more precise, if T is a fixed point in tree space, then in appropriate coordinates on tree 
space, the set of points whose geodesies to T share the same combinatorics comprise a convex 
polyhedral cone called a vistal cell (Theorem I3.23[) . and the vistal cells constitute a polyhe- 
dral subdivision of tree space (Theorem I3.27p . This metric combinatorics has direct roots 
in surprisingly similar statements for boundaries of convex polyhedra |34| . the parallel being 
unexpected because boundaries of convex polyhedra are positively curved, in contrast to the 
negative curvature of tree space. 

Metric combinatorics of tree space, particularly its polyhedral nature, combines with gen- 
eralities on nonlinear optimization in NPC spaces to give a second iterative method converging 
to the mean (Algorithm 14. 4p via descent procedures. The crucial observations are that the 
variance function has a unique local minimum on tree space, is continuously differentiable on 
each orthant of tree space, and has a simple formula within the interior of each vistal cell. 

Means in tree space have subtle, sometimes peculiar properties that inform our particular 
motivations (Section [5]), which come primarily from biological and medical applications, al- 
though we expect these observations to impact other fields where distributions of metric trees 
naturally appear. Evolutionary biology, for instance, considers honest phylogenetic trees, 
each representing a putative evolutionary history of a set species or genes (Example 15. 5p . In 
medical imaging, trees can represent blood vessels in human brain scans (Example I5.6|) . 

Some the theory in Sections HHH extends to arbitrary global NPC spaces, and all of it 
extends to global NPC orthant spaces (Section [6]). For the first iterative procedure (Al- 
gorithm 12.51) and the rest of Section as well as for the shortest path combinatorics in 
Section [TJ this means working in arbitrary global NPC spaces (Sections 16. 1H6.2|) . For the 
second iterative procedure (Algorithm I4.4p and the rest of Section [4l as well as for the metric 
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combinatorics in Section [21 this means working in piecewise Euclidean global NPC spaces that 
are formed by gluing orthants together by rules similar to — but substantially more general 
than — those defining tree space (Section 16, 3|) . The extensions suggest exciting new research 
in applying both statistical methods and numerical nonlinear programming techniques to a 
wide variety of problems. Important note: readers interested in the generality of abstract 
orthant spaces or arbitrary NPC spaces are urged to begin with Section [6J which sets up the 
notation and concepts in Sections [THU from that perspective. Hence such readers can avoid 
checking the proofs in the earlier sections twice. 

Acknowledgements. Our thanks go to Michael Turelli and Elen Oneal for help with ref- 
erences and discussions on biological applications, to Antonis Rokas for kindly providing the 
yeast data set, to Sean Skwerer and Ipek Oguz for preparing the brain artery dataset, and 
to Dennis Barden for comments on a draft of the paper. EM had support from NSF grants 
DMS-0449102 = DMS-1014112 and DMS-1001437. MO was partially supported by a des- 
Jardins Postdoctoral Fellowship in Mathematical Biology at University of California Berkeley 
and by the U.S. National Science Foundation under grant DMS-0635449 to the Statistical and 
Applied Mathematical Sciences Institute. Much of this research was facilitated by and carried 
out at the Statistical and Applied Mathematical Sciences Institute (SAMSI) as an outgrowth 
of the 2008-2009 program on Algebraic Methods in Systems Biology and Statistics. 

1 Tree space and the geodesic algorithm 

In this section, we describe the space of phylogenetic trees introduced by Billera, Holmes, 
and Vogtmann j!2j . as well as a distance and characterization of geodesies in this space. 

1.1 Phylogenetic tree space 

A phylogenetic n-tree T, or simply an n-tree, is an acyclic graph T with edge set S = £t 
whose leaves (degree 1 nodes) are labeled with index set L = {0, 1, . . . , n}, and whose interior 
vertices have degree at least 3. (The label is often referred to as the root of T, although 
that is not relevant in this paper.) The maximum number of edges in an n-tree is 2n — 1. 
Each edge e of T is assigned a nonnegative length \e\T, or |e| in case the ambient tree is 
clear. Removal of any edge e from T determines a unique partition of the leaves of T into two 
subsets X e and X e ; the pair X e \X e is called the split associated with e. A key property of 
splits in trees is that the splits X e \X e and Xf\X f of any pair of edges e and / are compatible 
if one of the sets X e n Xf, X e n Xf, X e n Xf, or X e n Xf is empty. A set S of splits is 
called compatible if every pair of edges in S is compatible. It turns out Theorem 3.1.4] 
that any compatible set of splits on L corresponds to a unique tree, and so from now on we 
identify a tree T by simply giving the splits and edge lengths for each edge in T. 
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A tree T can have an edge e whose associated length |e|y is 0. This corresponds to the 
edge e having been contracted in T. Denoting the set of edges of T with nonzero length 
by Sji allows the identification T ~ T' between two trees T and T' whenever (i) = ££, 
and (ii) their nonzero edge lengths are equal. 

Example 1.1. Two 5-trees are depicted in Figure [TJ For simplicity, we only give the splits 
and edge lengths for the three internal edges in each tree. The six internal edges are distinct 
since they have different splits, and the splits within each tree are compatible. The only 
compatible pairs between the two trees, however, are {ei,ee}, {e2,es}, and { 63,64}. 



1 




T T 



ei :{0,l,2,5}|{3,4} e 4 : {0, 1, 4, 5} |{2, 3} 

splits e 2 :{0,3,4,5}|{l,2} splits e 5 : {0, 1, 2, 3} |{4, 5} 

e 3 : {0,5} |{1,2,3,4} e 6 : {0, 1} |{2, 3, 4, 5} 

Figure 1: An example of two 5-trees. 



The tree space T n introduced by Billera, Holmes, and Vogtmann [12] is the space of all 
phylogenetic n-trees. It is obtained by representing each tree T £ T n on edge set £ by a 
vector in the Euclidean orthant 0(T) = 0{£) = whose coordinate values are equal to 
the corresponding lengths of the edges of T. As above, trees T and T' are identified between 
orthants whenever the associated trees satisfy T ~ T'. This makes T n a union of (2n — 1)- 
dimensional orthants — called maximal orthants — whose interiors are disjoint and which are 
identified along their boundaries through the equivalence ~ given above. The Euclidean 
length of a path in T n is the sum of the Euclidean lengths of its restrictions to the maximal 
orthants. This length endows T n with the metric d in which d(T, T') is the infimum of the 
Euclidean lengths of the paths from T to T' . Note that d(T,T') < oo, since the space T n is 
path-connected: any two points can be joined by straight line segments through the origin. 
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1.2 Geodesies in tree space 

Billera, Holmes, and Vogtmann [TT] show that tree space is globally non-positively curved (a 
global NPC space), equivalently known in this context as CAT(O). Among other things, this 
implies that shortest paths in tree space are unique, so they are unambiguously referred to 
as geodesies. This section summarizes the key results of |38j and [3D], which investigate the 
structure of geodesies in tree space and provide an 0(n 4 )-algorithm — the GTP algorithm — to 
find shortest paths. For notation, if T is a tree with edge set £ and A C £, then we write 

\\A\\t = 




and use \\A\\ if the tree T is clear. This means that \\A\\ = \e\ whenever A = {e}. 

We express a geodesic with endpoints X and T as a parameterized curve 7 : [0, 1] — > T n 
with 7(0) = X and 7(1) = T. If an edge e lies in both X and T, then it lies in every tree on 
the path 7, with length uniformly changing between the two terminal values \12\ Section 4.2]. 
We therefore focus first on the case when X and T have no internal edges in common, and 
ignore the lengths of the pendant edges (those containing leaves) in the distance computation. 

Each geodesic in tree space is a sequence of straight line segments, called legs, because tree 
space is piecewise Euclidean. Each leg is contained within a single orthant 0{Ei UFj), where 
Ei C Ex and F{ C £ T . The precise properties of the sets E{ and Fi making up these legs were 
determined in [38] . In particular, define the support (A, B) = {{A\, . . . , A^), (B±, . . . , B^)) of 
a geodesic 7 to consist of a pair consisting of a partition A\ U • • • U A^ of Ex and a partition 
B\ U • • • U -Bfc of £t such that the following property holds: 

(PI) for each i > j, the union Ai U Bj is compatible. 

The geodesic 7 has legs in OiEi U Fi), where 

Ei = A i+1 U • • • U A k 
and Fi = B 1 U • • • U Bi. 

The individual pairs (Ai,Bi) are the support pairs for the geodesic. 

Whether the shortest piecewise-linear path having these legs actually forms the geodesic 
between X and T is determined by the following two properties for (A, £>). 

11^1 II 11^2 1| || 

(P2) < < • • • < This is called the ratio sequence for (^4, £>). 

Il-oill II-D2II ll-Dfcll 

(P3) For all (Ai,Bi) and partitions I\ U I2 of Ai and J\ U J2 of Bi such that I2 U Ji is 
compatible and at least one partition is nontrivial, the inequality -p^jy > |^| holds. 

The properties (P1)-(P3) determine the geodesic between X and T, as well as the algebraic 
description of this geodesic in the following result. 
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Theorem 1.2 ( |4Ul Theorem 2.4]). Let X and T be trees in T n having no internal edge in 
common, and let (A,B) be a support for X and T satisfying properties (P2) and (P3). Then 



the unique geodesic 7 




(7(A) ' 


< A < 1) 


between X and T can 


be represented by legs 






7 (A) : 


A < ||Ai||" 
1-A - l-Bil 




for i = 0, 




y = < 




7(A) : 


Pill < A 
HSill - 1-A 


/ \\Ai+i\\~ 
~ Il^+ill. 


for i = 1, 


..,k-l, 




< 


7(A) : 


A \\A k \\ 
1-A £ \\B k \\ 




for i = k, 





where the points on each leg "f are associated with tree Ti having edge set 

Bi U ■ ■ ■ (J 5* U Ai+x U ■ ■ ■ U A k 

and edge lengths 



|e|r for e £ Aj 



6 T. 



A||B,-||-(1-A)||A,- 



The length 0/7 is 



^JJ] \ e \x foreeBj. 

L( 7 )= (||Ai|| + ||Si||,...,||A fc || + ||S fc || 



Example 1.3. Figure [2] shows an example of the geodesic 7 between the trees T and T' in 
Figure [1] in Example 11.11 The associated support (A,B) for 7 has A = {{e2, 63}, {ei}} and 
& = {{ e 6}) { e 4i e 5}}i an d the coordinates of seven equally spaced trees in 7 are given in the 
table. The length of this path, as given by ((TJ), is 



L(l)= (Il{e2,e3}|| + ||{e 6 }||,||{ ei }|| + ||{e 4 ,e 5 ] 



15>/2. 



The extension to the case where X and T have a nonempty set C of common edges was 
addressed in [401 Section 4]: remove the common edges between X and T from each tree, and 
then find the paths between the remaining disjoint forests, matching trees by their leaf sets. 
The common edges are then placed into the path with the length of each such edge being 



(l-A)|e| x + A|e| T . 



The length of 7 is now 



\A X \\ + piH, . . . , \\A k \\ + \\B k \\ ^2(\e\ T - \e\ x ) 2 J 



(2) 



(3) 



This allows pendant edge lengths to be taken into account. It also includes the special case 
when one of the trees has an edge e that is not in the other tree, but is compatible with all 
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Figure 2: Seven trees in the geodesic 7 between T and T' . 

edges of that tree. In this case, we can simply add e to the other tree with length, and 
then treat it as a common edge. 

To be able to work more easily with trees having common edges, we extend Theorem 11.21 
to the case where X and T have common edges, and in the process simplify the description 
of the geodesic considerably. To do this, we use the following three important conventions. 

(a) An edge is never compatible with itself; thus the pairs of identical edges in X and T 
must appear in the same support pair (At, Bj). 

(b) \\Ai\\ = — y^X^eeAi I^t f° r an y se ^ Ai of edges of X in common with T. 

(c) We extend the notation for support pair by adding the additional sets 

A) = B = A k+ i = B k+ i = 
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and define 



Poll 
Pol 



-oo and 



IB 



fc+i I 



OO. 



With these conventions we can restate the unified result. 

Theorem 1.4. Let X andT be any two trees inT n (not necessarily disjoint), and let (A,B) be 
a support for X and T satisfying (P2) and (P3). The unique geodesic 7 = {7(A) : < A < 1} 
from X to T has legs 



7 



7(A): 



I A: 



< 



A 



< 



\A 



i+l\ 



for i = 0, . . . , k, 



\Bi\\ 1 — A 

The points on each leg 7* are associated with the tree Tj having edge set 

Bi U • • • U Bi U A i+ i U • • • U A k 



(4) 



and edge lengths 



(1-A)P,||-A||£ 7 | 



[A; 



\e\ x ifeeAj 



X\\B 



^MAA ]eW ifeeB . 



\Ba 



The length of 7 is 



L( 7 )= (||Ai|| + ||Bi||,...,||A fc || + ||5 fc | 



(5) 



(6) 



Proof. The presentation in this theorem matches that of Theorem ll.2l except for the treatment 
of the common edges of X and T. Consider any edge e common to X and T. The definition 
of a support ensures that e lies in both Ai and Bi for some i. Further, by convention the ratio 
||e||x/||e||T is negative, so (P2) is never satisfied unless all of the common edges are placed 
at the front of the ratio sequence. This also means that for any A > 0, each common edge is 
contained in some B{ for the computation of its edge length at that point along the geodesic. 
Furthermore, since the common edges are mutually compatible with each other, they are 
placed in different support pairs whenever the ratios |e|x/|e|T differ. It follows that the 
common edges are always grouped in support pairs (A{,Bi) having HAH/H-BiH = — Mx/Mt 
for any e in that support pair. 

Now consider the length of a common edge e in leg 7* of the path. By ([5]), 



A 115, 



leta 



;i-A)p, 



\Bi 



A + (1 - A) 



\e\T 



A-(l-A) 



\e\x 
\eW 



\e\x = \\c\t + (1 



A)|e|x, 



|e|T 



which matches ([2]). 
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Next look at the term in §6§ corresponding to a support pair (A^, Bi) of common edges: 



(ll^ll + ll^-ID 2 = ( jfM + 1) W = E f 1 - W)* Ht = E (M* - M*) 2 

\ II ill / egBj V NT, 

Summing this over all such pairs (Ai,B{) yields 



E 1 



|e|r - \e\x) 2 



where C is the set of common edges. This matches the corresponding term in ([3]). 

Finally, take the case where an edge e lies in only one of the sets X and T, but is 
compatible with all edges in the other set. As above, add e to the other set with length 0, 
and treat these as common edges. Thus if e lies in X, then it appears in a support pair 
(Ai, 0) with Ai a set of edges compatible with all of T; and if e lies in T, then it appears in 
a support pair (0, BA with Bi a set of edges compatible with all of X. Since the ratios of 
these pairs is either or oo, respectively (since ||0|| =0), these pairs appear before and after 
any nontrivial pairs, respectively. Further, the edge component values and path length are 
as indicated in ([5]) and ([6]), respectively. This completes the proof of the theorem. □ 

We end the section by giving a canonical representation for any geodesic. 

Lemma 1.5. Any geodesic 7 can be represented by unique support (A,B) satisfying 

■A" <<£!<...< Mil. (7) 



11-^1 II II^H H-B 
This support is called the minimal support. 

Proof. This is the content of the remark in [401 Section 2.3]. The basic argument is as follows. 
Any support (A 1 , B') 7^ (A, B) of form ([7]) results in a different geodesic, since by they 
have different legs. On the other hand, for any representation of 7 having equalities in the 
ratio sequence, combine the respective sets in every equality subsequence. The resulting 
support continues to satisfy (P2), and hence there is a shortest piecewise linear path from 
X to T through the prescribed orthants. Further, from ((HJ) it follows that the length of this 
path equals that of 7, and hence defines the unique geodesic 7. □ 

Remark 1.6. Theorem 11.41 positions the support pairs corresponding to edges compatible 
with both trees into (|7|) as follows. 

(i) The set Nx of edges of X that are not in T but are compatible with all edges of T is 
the set Ao, with Bq = and ratio p^jj = = —00. 

(ii) The set Nt of edges of T that are not in X but are compatible with all edges of X is 
the set Bk with Ak = 0, and so its ratio is ||^|| = ttSt = 0. 
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(iii) Any edge e that lies in both X and T (and hence has positive length in both sets) 
appears in both sets of some support pair (Ai, Bi), and so the ratio is — oo < M4 < 0. 

(iv) All other support pairs have jj^jj- > 0, so both sets in the support pair are nonempty. 

The ordering of the support pairs in (i)-(iii) has no effect on the structure of the geodesic 
between X and T, so for the remainder of the paper we take the ratio sequence for a geodesic 
to represent only the positive ratios in the sequence. 

2 The mean and variance in tree space 

Given a finite point set T = {T 1 , . . . , T r } of trees in T n , the mean of T, alternatively known 
as the Frechet mean or barycenter, is the tree T £zT n that minimizes the sum S(X,T) of the 
squares of the distances from X to the points in T. The variance of T is S(X,T)/r. Since 
r is constant throughout the following discussion, we abuse notation and henceforth refer 
to the variance as simply S(X,T). (The reader interested in Frechet means of more general 
distributions than finite-support equal- weight measures should read Section [672] before contin- 
uing.) The motivation for considering these notions of mean and variance as the appropriate 
statistical objects in tree space was given by Sturm [17], who established the mathematical 
foundations for probability theory on global NPC spaces. This section reviews the required 
basics of Sturm's geometric methods in the context of tree spaces. (Readers interested in a 
more general treatment for arbitrary global NPC spaces should read Section [6] now.) 

2.1 The variance function 

Let T E T n be a fixed tree, and consider the geodesies from T to a variable tree X E T n - 
The tree X can be thought of as a vector in M^, whose coordinates are express using the 
corresponding lower-case letter x. If the geodesic from X to T has support pair [A, B) as in 
Theorem 11.41 then the squared distance d(X,T) 2 from X to T is expressed as the function 

S T (x) = ^T(\\x At \\ + \\B t \\) 2 (8) 

i=l 

in which x J ± i is the vector whose coordinates are restricted to edges in A{. It follows that for 
a set T = {T 1 , . . . , T r } C T n of trees, the variance function S(X, T) can be written 

r 

S(x):=S(X,T) = Y,S Te (x). (9) 

i=\ 

Thus the mean T can be thought of as the point x* that minimizes S(x) over x E Ti- 



ll 



To state the next result, a real- valued function / : T — > K on a metric space T is strictly 
convex if / o 7 is a strictly convex real-valued function on M for all geodesies 7; that is, if 



/(7(A)) < (1 - A)/( 7 (0)) + A/ ( 7 (1)) whenever < A < 1. 



Proposition 2.1. XTie variance function S(x) is strictly convex as a function on T n . Con- 
sequently, the mean is the unique local minimum of S(x) in T n . 



The differentiability of the variance function S is critical to the construction of gradient- 
descent methods for minimizing S. This in turn depends on the differentiability of the 
individual functions St in Q. Identifying the geodesic from X to T by its support (A,B), 
Eq. ([8]) yields the partial derivatives of St with respect to each of the coordinates x e : 



where A4 is the set containing e. This is well-defined whenever x lies in the interior of its 
maximal orthant, although the functional form of (IIPD depends upon the combinatorial type 
of the geodesic, and in particular on (A,B). It turns out, however, that throughout the 
interior of any maximal orthant the function S is continuously differentiate. 

Theorem 2.2. The variance function S(x) is continuously differentiate on the interior of 
every maximal orthant O. 

Proof. Prom ^ it suffices to show that the function (|1U|) is continuously differentiate on 
the interior of O. By Lemma [1.51 the geodesic between X and T can be represented uniquely 
by support (A°,B°) satisfying ([7]). Any other support (A,B) for this geodesic consists of 
a sequence of sets partitioning the A® and into equality subsequences, with the ratios 
HA? II /II -Bj II in the equality subsequence equal to the corresponding ratio ||^4- ) ||/||-B- ) || of the 
sets from which they were partitioned. But this means that the ratio H-BjU/Hx^ ||, and hence 
8St(X) I 'dx e , is the same regardless of which representation we choose for the geodesic. It 
follows that the partial derivatives are continuous everywhere in the interior of O. □ 

2.2 Sturm's algorithm 

The orthant structure of tree space T n prevents the averaging of finite point sets using the 
standard Euclidean centroid. The following serves as an approximate replacement, intro- 
duced by Sturm [47} Definition 4.6] in the context of probability theory on arbitrary globally 
nonpositively curved spaces. 



Proof. [El Proposition 1.7]. See also Example 16.31 



□ 




(10) 
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Definition 2.3. For a set X 1 ^ 2 , ... of points in T n and an index k, the inductive mean 
value of X 1 , . . . , X k is the point 

E * 

£=l,...,fc 

defined by setting /xi = X 1 and for £ = 2, . . . , k, letting m be the point jyg = 7(1/^) that is 
l/l along the geodesic 7 from fii-i to 71 = X . 

Note that if all of the X 1 lie in the same orthant, then the inductive mean value of 
X 1 , . . . ,X k in fact equals the standard centroid of X 1 , . . . ,X k . For general points in T n , 
though, the inductive mean may not be the Frechet mean; it may in fact give different points 
for different orderings of the points X 1 , . . . ,X k . Sturm goes on to prove \A7\ Theorem 4.7] 
the following strong law of large numbers for the Frechet mean. 

Theorem 2.4. Fix a set {T 1 , . . . , T r } C T n of trees. If X 1 , X 2 , ... is a sequence of points 
sampled uniformly and independently from {T 1 , . . . ,T r }, then with probability 1, the sequence 
of inductive mean values fij, /X2, • • ■ approaches the mean T of {T 1 , . . . , T r }; that is, 

1 E 

e=i,...,k 

To be precise, Sturm shows that if we take the inductive mean Hk as a random variable 
dependent on the sampling of the points X e , then the distance d{jj,k,T) from to the true 
mean T has expected value bounded above by S(T,T)/k. This gives us a way of estimating 
the Frechet mean T through a sequence of inductive means fii, fj,2, ■ ■ ■ obtained by randomly 
sampling trees from the set {T 1 , . . . ,T r }. 

Algorithm 2.5 (Sturm's algorithm). 

INPUT a set {T 1 , . . . , T r } of trees in T n 
positive integers K and N 
positive real number e 
output fif- = k th approximation of the mean tree 

initialize choose a tree T € {T 1 , . . . , T r } uniformly at random 
set \i\ := T 
set k := 1 

while k < K or pairwise distances d(fij , for k — N < j, £ < k are not all < e 
DO choose tree T S {T 1 , . . . , T r } uniformly at random 
set 7 := the geodesic from T to fi^ 

set fik+i ■= 7l/(fe+i) 
set k := k + 1 

END WHILE-DO 

return the k th approximation of the mean tree 
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Remark 2.6. The choice of stopping criterion involves two parts. 

(i) Running the algorithm a specified initial number K of iterations guarantees an upper 
bound of K r +l S(T, T) on the expected distance of the final tree Hk to the mean T. 

(ii) Comparing the final N sample means serves as a proxy for testing that the sample 
means act like a Cauchy sequence for N steps. 

Thus in principle, proper settings for K, N, and e could be used to set confidence intervals 
on the distance d(fii,T) by using Sturm's result. This would involve a more sophisticated 
statistical analysis, which we did not undertake in this paper. For our experiments, such 
as those reported in Examples 15.61 and 15.51 we chose K = 1000 000 and N = 5. We either 
fix e to be some value, as in Example 15.61 or we calculate it based on the square root of 
the sample variance of the sample mean tree fi^ r . In Example 15.51 for instance, we use 
e = 10~ 5 (S(/i5 r , T)) 1 / 2 . If a function is utilized to determine any of these parameters, then 
it should depend on the size r of the set of trees to be averaged. 

Remark 2.7. We have made software implementing this algorithm freely available |39j. 

3 The combinatorics of geodesies in tree space 

This section investigates the combinatorial structure of geodesies in T n and their relationship 
to the variance function. To be more precise, fix a source^ tree T 6 T n . The shortest 
path from an arbitrary tree X S T n to T has a "combinatorial type", determined through 
Theorem 11.41 by the sequence of orthants that it passes through, or more specifically the 
support pair (A, B) associated with the geodesic. This combinatorial type can change, even 
when X has the same topology, depending on the precise values of the lengths of the edges in 
X. We are interested in the partition of T n — callecothe vistal subdivision of T n — into regions 
for which the geodesies to the fixed tree T have the same combinatorial type. 

We begin by describing a simple change of coordinates, the squaring map, and character- 
izing the faces of maximal dimension in the vistal subdivision (Section 13. ip . In particular, 
Propositions 13.31 and 13.41 establish that after applying the squaring map the vistal facets are 
polyhedral regions that cover tree space but have disjoint interiors. Next we provide a simple 
description of the faces of lower dimension in these polyhedra (Section I3.2[) . Finally, we prove 
that the vistal facets constitute the maximal cells of a polyhedral subdivision of tree space, 
called the vistal polyhedral subdivision (Section I3.3p . 

lr The terminology comes from studies of polyhedral unfolding [34], where the unmoving point T emits a 
signal at unit speed, so at time d(X,T) the wavefront passes through X. 

2 Our use of the term "vistal subdivision" here differs from [341 Conjecture 9.6]: vistal facets in Definition ^. II 
here are analogous to cut cells in |34l Definition 5.4]. In contrast, the equivalence relation in [34] declares two 
points equivistal when their vistal subdivisions — in the sense of Theorem 13.271 — are combinatorially the same. 
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3.1 Vistal facets 



Definition 3.1. Given a source tree T G T n , a maximal orthant O C 7n, and a support 
(A,B), let V(T,0;A,B) be the closure of the set of trees X G for which the geodesic 
joining X to T has support (.A, #) satisfying (P2) and (P3) with strict inequalities. A 
previstal facet is any nonempty set V(T, O; A, B) of this form. 

The description of V(T, O; A, B) becomes linear after a simple change of variables. 

Definition 3.2. The squaring map T n ^tT n acts on x G 7^ C by squaring coordinates: 

(x e | e G E) i— )• (£ e | e G -E 1 ), where £ e = x e . 

Denote by T 2 the image of this map, and let £ e = x 2 denote the coordinate indexed by 
e G E. The image of an orthant in T n is then the equivalent orthant in T 2 , and the image 
of a previstal facet V(T,0;A,B) in T 2 is a vistal facet denoted V 2 (T,0;A,B). With this 
change of variables, ||£|| = ^ e eS^ e- 

The squaring map induces on the variance function S a corresponding pullback function 

S 2 (0 = S(v^), where (^l)e = (H) 

Since the variance function S'(x) is continuous on T n with a uniquely attained minimum by 
Proposition 12.11 and continuously differentiable on the interior of each maximal orthant by 
Theorem 12.21 the same properties hold for S 2 . Thus we can apply steepest descent methods 
after squaring just as we would beforehand. This is further explored in Section HI 
Theorem 11.41 implies a nice description of the vistal facets of . 

Proposition 3.3. The vistal facet V 2 (T, O; A, B) is a convex polyhedral cone in defined 
by the following inequalities on £ G M. E , where all norms ||-|| are to be interpreted as \\-\\t- 

(O) £ G O; that is, £ e > for all e G E, and £ e = for e £ £, where O = M+. 
(P2) ||B m || 2 < ||^ || 2 Yl for alii = l,...,k-l. 

(P3) \\Bi \ J\\ 2 ^2 > \\J\\ 2 ^2 ^ e f or a M i = 1) • • • j k an d subsets I C Ai, J C Bi such 

e£Ai\I e£l 

that I U J is compatible. 

Proof. For a vector x = (x e \ e G E) G O to lie in V(T, 0; A, B), the tree X must be in 
an orthant of T n and satisfy properties (P2) and (P3). The orthant condition immediately 
implies the nonnegativity conditions in (O). The inequalities corresponding to (P2) are 

WAW < \\A i+1 \\ 

~ ll-^j+lll 

Squaring, cross-multiplying, and substituting £ e for x 2 yields the corresponding linear inequal- 
ity in V 2 (T,C; A,B) in T 2 . The inequalities for (P3) are obtained in the same manner. □ 



15 



Proposition 3.4. The vistal facets are of dimension 2n — 1, have pairwise disjoint interiors, 
and cover T 2 . A point £ G 7~ 2 lies interior to a vistal facet V 2 (T, O; A, B) if and only if the 
inequalities in (O), (P2), and (P3) are strict. 

Proof. The vistal facets cover 7^ by definition: T G V 2 (T,0;A,B) if the geodesic from 
X G O to T has support (A,B). The second statement follows by definition and by standard 
properties of convex polyhedra presented as solutions to systems of linear inequalities. □ 



3.2 Vistal cells 

Henceforth in this section we focus our attention on the squared tree space 7^ and its 
expression as a union of polyhedral vistal facets as given by Proposition 13.31 This subsection 
concerns faces of vistal facets, including compact characterizations thereof. 



3.2.1 Signatures and vistal cells 

Definition 3.5. Fix a source tree T G T n , a (not necessarily maximal) orthant O C T n: and 
a support (A,B). A signature associated with the support (A,B) is a length k — 1 sequence 
S = (ax, . . . , Ofc-i) of symbols Oi G {= , <}. The previstal cell defined by O, A, B, and S 
is the set V(T, O; A, B; S) of points X in T n for which the ratio sequence for (A,B) at the 
point X has the following specific form: 

11^1 II P2II ll^fc-i|| \\A k \\ 



ll-^ill ll^ll " ||-Bfc-l|| " H-Bfc,, 

The vistal cell V 2 (T, O; A, B; S) C 7^ is the image of V(T, O; A, B; S) under squaring. 

Remark 3.6. Vistal cells are convex polyhedra that need not be bounded, and as such they 
might not be topological cells. However, the interior of a convex polyhedron is a topological 
cell, so every vistal cell is the closure of a topological cell. 

Lemma 3.7. The dimension of the vistal cell V 2 (T,0;A,B;S) is at most dim(O) — m(S), 
where m{S) is the number of "=" components in S. The vistal cell V 2 (T,0;A,B;S) is full- 
dimensional if and only if there exists a point X G V(T, 0; A, B; S) satisfying the following 
two properties. 

(VI) For each i = l,...,k-l, JM = J|^{ if Oi is "=" and M4 < J^±4 if a { is "<". 
(V2) The inequalities in (P3) are satisfied strictly. 
Proof. This follows from standard polyhedral theory, as treated in [49J . for instance. □ 



Proposition I3.3I implies that (i) vistal cells are faces of vistal facets, and that (ii) vistal 
facets are vistal cells for which O is maximal and the signature contains only "<" symbols. 
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What we prove here is that all faces of vistal facets can be represented as vistal cells, and 
that under some simple conditions on (A,B), Definition 13,51 provides a canonical description 
of each vistal cell. We start by determining all supports and signatures associated with the 
geodesic 7 from T to a particular point X. By Lemma [L5l the geodesic 7 can be represented 
by a unique minimal support (A,B) satisfying ([7]): 



Any other support (A 1 , £>') of 7 corresponds to a ratio sequence in which at least one ratio 
|| || /|| Si || is replaced by a ratio subsequence formed from a partition of At and Bi, with 
equalities between all terms. Any ratio subsequence for which X continues to satisfy (P3) 
together with equalities between terms of the ratio subsequences constitutes a valid support 
for 7. We next give a specific method for determining all such support sequences. 

3.2.2 Incompatibility graphs and equality subsequences 

For trees in [40] it was shown how condition (P3) for a support pair (Ai, Bi) can be rephrased 
in terms of conditions on a special node-weighted graph derived from the compatibility re- 
lations between X and T and their coordinate values. We summarize the technique here. 
Denote the coordinates of X and T by X = (x e \ e G Ex) and T = (t e \ e £ St), and let 
£ e = x\ and 77 = t 2 e be their squared coordinates. 

Definition 3.8. The incompatibility graph G(Ai,Bi) between Ai and B{ is the weighted 
bipartite graph with vertex set Ai U Bi and an edge from a G Ai to b G Bi whenever a and b 
are incompatible. The weight of each vertex a G X is £ a = £ a / YleeA- £e> and the weight of 
each vertex b G T is f& = Tfc/X^egB- r e- A (vertex) cover for G(Ai,Bi) is a set C C Ai U -R; 
having the property that every edge of G(Ai,Bi) has at least one endpoint in C. The weight 
of C is the sum of the weights of its vertices. 

Lemma 3.9 ( [40|, Section 3]). Property (PS) holds for support pair (Ai,Bi) if and only if 
every cover of G(A{, Bi) has weight > 1. □ 

By Lemma 13.91 testing a support pair (Ai, Bi) for property (P3) is equivalent to showing 
that the min weight cover for G(Ai,Bi) has weight 1. The problem of finding the minimum 
cover in G(Ai, Bi) in turn can be reduced to solving a max flow problem (see [2], Section 12.3) 
on a specially defined flow network F(Ai,Bi). To construct F(Ai,Bi), start with G(Ai,Bi), 
attach a source s to the Aj-vertices of G(Ai, Bi) and a sink i to the -Bj-vertices of G(Ai,Bi), 
and direct all edges from s toward t. Set the capacity of each edge (s, a) to £ Q , set the capacity 
of each edge (b, t) to ffc, and set the capacities of edges in G(Ai,Bi) to 00. The Max-Flow-Min- 
Cut Theorem implies that the value of the maximum (s, f)-flow / for F(Ai, Bi) is equal to the 
capacity of a minimum capacity of an (s, f)-cut K in F(Ai,Bi), which in turn corresponds 
to a minimum weight cover C for G(Ai,Bi). Thus the condition in Lemma 13.91 for G(Ai,Bi) 




< ... < 
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is equivalent to the property that the max flow in F(A{, Bi) is > 1. The precise relationship 
between max flows in F(Ai,Bi) and min covers in G(Ai,Bi) is crucial to determining the 
possible ratio subsequences that can replace a term in 0, and we clarify this 

relationship below. 

Example 3.10. Figure [3] demonstrates this for a hypothetical support pair (Ai,Bi) with 
Ai = {x 1 ,x 2 ,x 3 ,X4,X5,xe,x 7 ,xs} and B { = {tiMMMMMM}, £a, n and G(Ai, Bi) as 
given in Figure El^a). Figure E^b) depicts the associated flow graph F(Ai,Bi) and max flow. 
For simplicity, the weights are not normalized, so that all numbers are scaled by 23, the sum 
of the weights. This flow has value 23, which means that the pair {Ai,Bi) satisfies (P3). 

3.2.3 Residual graphs and ratio subsequences 

Now consider the problem of determining the possible ratio subsequences replacing a term 
|| || /|| in the ratio sequence of a minimal support for X and T. We use the optimal flow 
conditions on F(Ai,B{) to do this. Recall that here (Ai,Bi) also satisfies (P3), so that the 
max flow / on F(Ai,Bi) has value 1. The associated minimum weight cover for G{Ai,Bi) 
can then be obtained from this flow. To do this, we define another auxiliary graph. 

Definition 3.11. The residual graph G\ with respect to / has 

(a) all edges of G{Ai,B{), directed as in F(Ai,Bi), and 

(b) all edges e of F{Ai, Bi) — but in the reverse direction — where f e > 0. 

An (s,t)-cut in G\ is any partition (H,H) of the nodes of G\ having the property that no 
edge of G\ goes from H to H. 

The definition of residual graph is based on the structure of F(Ai,Bi) and the fact that 
the flow / saturates (is at capacity on) all edges adjacent to either s or i. The Max-Flow- 
Min-Cut Theorem states that every (s, i)-cut in G\ corresponds to a cut of capacity 1 in 
F(Ai,Bi), which in turn corresponds to a cover of weight 1 in G(Ai,Bi). This leads to the 
following result. 

Lemma 3.12. Let (H,H) be a cut in the residual graph G\. Then the sets I\ = H n Ai, 
J i = H n Bi, I2 = H fl Ai, and J2 = H n Bi have the property that jp^jy = ||^| can replace 
P^jj in and the resulting sequence still satisfies (P2) and (PS). 

Proof. By Definition 13. Uf a) all edges of G(Ai,Bi) are in G\, so in particular there can be 
no edge from any element in I2 to any element in J\. Thus I2 U J\ is compatible. Further, 
by Definition 13.11T b) there are no edges of G\ from H to H, so the flow is conserved in H, 
and hence in H. This implies ||/i|| = ||Ji|| and H/2II = ||^2||) and thus the ratios are equal. 
Finally, since the flow / restricted to each of the subgraphs F(I\, J\) and F(l2, J2) continues 
to saturate the edges adjacent to s and t, property (P3) continues to be satisfied on the 
replacement support pairs (Ji, J±) and (h,J2)- □ 
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(c) Residual graph G r with contracted nodes circled 
(thick edges are doubly-directed.) 



T 




s 

(d) Final acyclic graph G* . 
U = {xt,x 2 ,x 3 , ti}, V = {x Al x 5 , x 6 , t 2 ,t 3 , t 4 , t 5 }, W = {x 7l t 6 }, X = {x 8 , t 7 } 

Figure 3: Finding the partitions 
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Example 3.13 (continuation of Example I3.10p . One min cut with respect to the flow in 
Figure [3](b) has H = {x\,X2,X3,ti,t} and H its complement; this corresponds to the pairs 
h = {xi,x 2 ,x 3 }, Ji = {h}, h = {x4,x 5 ,x 6 ,x 7 ,x 8 }, and J 2 = {t 2 , t 3 , U, £5, t 6 , t 7 }, with 
squared ratios | = 

Iteratively applying Lemma f3. 121 to the resulting graphs G{I\, J\) and G{l2, J2) can pro- 
duce various replacement subsequences for (Ai,Bi), depending upon the choice of min cuts 
and the number of times the lemma is applied. Picard and Queyranne [3T] give a method to 
find all cuts for this flow problem, thereby allowing us to characterize all ratio subsequences 
associated with (Ai,B{). 

Definition 3.14. Write G*(X) for the result of modifying the residual graph G\ by con- 
tracting all edges contained in directed cycles. 

The directed graph G*(X) is acyclic, is independent of the actual (max) flow /, and has 
nodes corresponding to a partition of the nodes of Ai U Bi U {s, t}. Furthermore, the nodes 
in any partition obtained by iteratively applying Lemma 13.121 must consist of unions of the 
sets corresponding to the nodes of G* . 

Definition 3.15. An upper ideal for G*{X) is any set I of nodes of G*(X) such that v £ I 
whenever u G I and (it, v) is an edge of G*(X). 

A partition (H, H) is therefore a cut if and only if H is an upper ideal. Let X, denote the 
set of upper ideals of G*(X), excluding the trivial ideal {s}. The next corollary follows from 
this discussion. 

Corollary 3.16. The maximum size of any ratio subsequence that can replace (Ai,B{) in ^> 
is equal to the number of vertices in G*(X) \ {s, t}. Moreover, the ratio subsequences 



This simplifies further. A topological ordering of G*{X) is any numbering of the vertices 
so that for every edge (u, v) of G*(X), vertex v is numbered lower than u. Every acyclic 
graph has at least one topological ordering. 

Corollary 3.17. The maximum- cardinality ratio subsequences of Corollarv \3. 1 6\ are in bijec- 
tion with the topological orderings of G*(X). In fact, any ratio subsequence for a particular 
pair (Ai,Bi) corresponds to a partition of the vertices of G*(X) according to one of these 
topological orderings. 




7.2 




are in bisection with nested sequences of sets in X%. 



□ 
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Example 3.18 (continuation of Example 13. 1U] ). Applying Corollary 13.171 to the example in 
Figure[3j the only two acyclic orderings of G* are (U, V, X, W) and (U, V, W, X), which results 
in the two maximal subsequences 



(A,B) 



({x 1 ,x 2 ,x 3 }, {ti}) , ({x 4 , x 5 , x 6 }, {t 2 ,t 3 , t 4 ,t 5 }) , ({x 7 }, {t 6 }) , ({x 8 }, {t 7 }) 
({x 1 ,x 2 ,x 3 }, {ti}) , ({x 4 , x 5 , x 6 }, {t 2 ,t 3 , t 4 ,t 5 }) , ({x 8 }, {t 7 }) , ({x 7 }, {t 6 }) 



respectively, both of which have squared ratios of| = y| = y = y = l. The set of possible 
replacement subsequences for (A, B) corresponds to the twelve distinct contiguous partitions 
that can be formed from one of the above two sequences. 

3.2.4 Valid support sequences 

We next set up the combinatorial structure to give a canonical description of the vistal cell 
V 2 (T,0;A,B;S). 

Definition 3.19. Let (Ai,Bi) be a support pair for the minimal support (A,B). A valid 
support sequence for (Ai, Bi) is comprised of a set of pairs (A[ 1 , B[ 1 ), . . . , (A\ £ , B[ £ ) with the 
following properties. 

(Fl) The sets A\^ and B[ j are nonempty and partition Ai and Bi, respectively 
(F2) The incompatibility graph G(A[ -,B[ ■) is connected for each j = 1, ... ,£. 
(F3) Contracting the sets A\ ■ U B[ ■ in G(Ai, Bi) results in an acyclic graph. 

Example 3.20 (continuation of Examples 13.101 and 13.18]) . Any support derived from the 
maximal supports in Example 13. 181 is a valid support sequence, except for the two supports 

({xi,x 2 ,x 3 }, {ti}) , ({x 4 , x 5 ,x 6 }, {t 2 ,t 3 ,t 4 , t 5 }) , ({x 7 , x 8 }, {t 6 ,t 7 }) 
and ({xi , x 2 ,x 3 , x 4 , x 5 , x 6 }, {h,t 2 ,t 3 ,t 4 , i 5 }) , ({x 7 , x 8 }, {i 6 , t 7 }) , 

whose final pairs do not correspond to connected subgraphs of the compatibility graph. 

Lemma 3.21. Let X 6 T n have associated (X,T)- geodesic with minimal support (A,B) 
satisfying §7ty, and for some index i let (A[ l5 B\ x ), . . . , (A\ e , B[ e ) be a valid support sequence 
for (Ai, Bi). There is an element X' 6 T n with 0(X') = 0(X) for which the geodesic between 
X' and T has support 

A' = Ai, . . . , Ai—i, A'^, . . . , A\p Ai + \, . . . , A/- 
B' = B\, . . . , Bi-\,B' iX , ... , B[p B i+ i, ... ,B k 

with 

Pill < < ||^-i|| < \KJ = _ = |I4,|| < \\Aj+4 < < \\A k \\ 



1-^1 II ll-^i-lll ll^lll WB'ijW ll-^i+lll ll-^fcl 
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Further, for any pair (A£ ^B[ •) and any partition I\ n I2 of A\ ■ and J\ n J 2 of B[ ■ in which 
I2 U Ji is compatible, 

\\h\\ \\h\\ 
\\Ji\\ WHY 

Proof. For support pair (Aj,J?j), let £ and f be the weights on the vertices of G{Ai,Bi). 
Define X' by replacing the (squared) weights on X for each a E A[ ■ by 



be£j(a) 



76 



degj-(6) 



where Ej(a) is the set of vertices b £ B[ ■ such that (a, 6) is in the incompatibility graph, and 
degj(fe) is the number of edges of the incompatibility graph from A\ ■ to 6. These values are 
all well-defined and positive by (Fl) and (F2). Place the following now / on the associated 
flow graph: for edge (a, b) where a G A^- and b € ■ for any 1 < j < I, let the flow on 
that edge be r&/ degj(6); for all other edges, let the flow be 0. Then the flow into node b is 
exactly 77, and the flow out of a is exactly £ a . Corollary 13.161 and property (F3) ensure that / 
is a max flow with respect to the flow graph, with flow value 77, = £ a = 1, and since 
flow is conserved between each A^ • and ■, the original (un- normalized) weights satisfy 

Kill KJ Kll 



11^,1 II Pi, J ll-Sill 

Finally, for a pair (A^ ■, B[^), let ii n/2 and Ji fl J2 be partitions of A\^ and B[^ respectively, 
in which I2 U Ji is compatible. This means that there are no edges of G(A' i j,B[ •) from I2 
to Ji, and since G{A[ ■, B[ ■) is connected there must be at least one edge going from I\ to J2. 
Since flow is positive on all edges of G{A' i ^B^ •), there is a net flow from I\ away from Ji, 

and from the definition of £' it follows that jj-^jy > jmy- I— ' 
3.2.5 Canonical description of vistal cells 

Finally, we extend Propositions l3.3l and l3.4l to describe all vistal cells associated with (X, T)- 
geodesics from points X in an orthant O. Since a valid support sequence is determined by 
the combinatorics of the splits and not by their edge lengths, we can define the following. 

Definition 3.22. A valid support sequence for (0,T) is a support (A,B) for which each 
maximal equality subsequence 



|Aj|| ||Aj-|_i|| \\Aj\ 



\Bi\\ ||-Bi+i|| \\Bj\ 



(13) 



satisfies properties (F1)-(F3) with respect to the pair (U^ =i A& } U^ =i Bg). Write G(0,T) for 
the corresponding incompatibility graph. 
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Theorem 3.23. Fix a tree T G T n . 

1. Vistal cells associated with geodesies to T are exactly those of the form V 2 (T, O; A, B; S), 
where (A,B) is a valid support sequence for (0,T) and S is a signature on (A,B). 

2. The dimension of the vistal cell V 2 (T, O; A, B; S) is dim(O) — m{S), where m(S) is the 
number of "=" components in S. 

3. The representation by a valid support sequence and signature is unique up to reordering 
the support sets within each equality subsequence of S. 

Proof. Claim 1. Let V 2 (T, O; A, B; S) be a vistal cell containing an interior point £. The 
definition of support and the fact that £ is positive implies that (Fl) and (F3) hold for 
G(0,T). Now suppose that (F2) fails to hold; that is, some G(Ai,Bi) has a partition into 
two disjoint subgraphs on vertex sets 1\ U J\ and I2 U J2, respectively. Let / be the max 
flow in G(Ai,Bi). Since (P3) is satisfied, / saturates all arcs adjacent to the source and sink. 
But since flow in each of the disjoint subgraphs G(I\,Ji) and G(l2, J2) is self-contained, 
jpil = P2! = F^T' means that the corresponding tree X satisfies one of its (P3) 

inequalities at equality, so £ cannot be in the interior of V 2 (T, O; A, B; 5), a contradiction. 
Thus (F2) is also satisfied, so (A,B) is a valid support sequence with respect to (0,T). 

Conversely, let (A, B) be a valid support sequence with respect to (O, T). Consider a ratio 
subsequence (fT3j) with all terms equal. Since (A,B) is a valid support sequence, Lemma f3. 21 1 
constructs positive weights X on the edges indexed by Ag, for I = i, . . . , j, so that (fT3l) holds 
and all (P3) inequalities are strict inside each support pair. Now for each maximal-length 
equal-ratio subsequence, scale the vectors of each term by the same positive multiplier Ajj 
so that the sequence of multipliers Ay is increasing with the indices. The scaled a/ vectors 
concatenate into a vector X in the interior of O having the correct signature indicated by S, 
and for which the (P2) inequalities hold strictly between the equal-ratio subsequences. The 
squared point £ corresponding to X therefore lies interior to V 2 (T, O; A, B; S), and the desired 
result follows. 

Claim 2. The vector £ constructed in the proof of Claim 1 is positive in O, satisfies all 
(P3) inequalities strictly, and satisfies all (P2) inequalities strictly for which the corresponding 
component of S is "<". Therefore the dimension of V 2 (T, O; A, B; S) is determined entirely 
by the set of equalities defined by the component of S that are "=" . Since these are linearly 
independent, the dimension is as stated. 

Claim 3. Let F = V 2 (T,0;A,B;S) and F' = V 2 (T,0';A',B'S') be two representations 
of vistal cells, defined by valid supports (A,B) and (A',B') respectively. Any permutation 
of support pairs within an equality subsequence (|13p results in the same set of equalities, so 
if the representations differ only by such a permutation, then F = F' . Conversely, suppose 
F = F' . Since all cell constraint inequalities other than those specified by S are satisfied 
strictly, the set of equalities dictated by S define the affine hulls of F and F' . This means 
that the two associated equality systems are row-equivalent. Now suppose that the supports 
(A, B) and (^4', £>') do not comprise the same sets; that is, by symmetry the two sets Ai 



23 



and Aj both have nonempty intersection with the same set A',. Since the variables of A' k do 
not appear in any other A'^ for I 7^ k, no row transformation of the equality system for F 1 
could possibly separate the variables in Ai n A' k from those in Aj n A',. Thus the two equality 
systems are not the same, a contradiction. □ 

3.3 Vistal subdivisions 

Theorem 13.231 allows us a purely combinatorial way of describing vistal cells. This gives us 
the machinery to prove the principal result of the section, namely that the vistal cells are 
the faces of a polyhedral subdivision of tree space under the squaring map. To make this 
precise, we start with some definitions concerning polyhedra; see |49t Lecture 5] for further 
background. 

Definition 3.24. A polyhedral complex E is a finite collection of polyhedra such that 

(CI) every polyhedral face of every polyhedron in E is a polyhedron in E; 

(C2) the intersection of any pair of polyhedra in E is a face of each. 

The dimension of E is the largest dimension of a polyhedron in E. The facets of E are the 
maximal cells. The underlying set of E is the union IJi/es ^ OI " the polyhedra in E. 

Example 3.25. Tree space T n has a natural polyhedral structure as the underlying space of a 
polyhedral complex whose polyhedra are its orthants. This polyhedral structure is unchanged 
by the squaring map, and thus also found in T 2 . 

The relation between vistal cells and orthants is one of refinement, in the following sense. 

Definition 3.26. Let E and E' be polyhedral complexes. Then E' is a subdivision of E (it is 
also said that E' refines E) if each polyhedron in E' is contained in a single polyhedron in E. 

Theorem 3.27. The vistal cells of tree space for any fixed source tree refine the natural 
polyhedral structure of1~% to form a vistal polyhedral subdivision of!~%. 

Proof. The vistal cells are polyhedra whose union is T% by Propositions 13.31 and 13.41 

For (CI), we show that changing any of the inequalities defining a vistal cell to equality 
results in a set that can be described as a vistal cell. Let V = V 2 (T,0;A,B;S) be a vistal 
cell, so that by Lemma 13.211 (A, B) is a valid support sequence, and let F be a proper face 
of V obtained by setting one of its boundary inequalities to equality. There are three types of 
inequalities that define F: (P2) constraints, nonnegativity constraints, and (P3) constraints. 

For the (P2) constraints, consider the inequality jrg4r < ||^' + ||| , where the corresponding 
component of the signature S is "<" . Let S' be obtained from S by setting this inequality 
to "=" . Since neither O nor (A, B) has changed, this constitutes a valid support sequence, 
and F = V 2 (T,0;A,B;S'). 
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For the nonnegativity constraints, consider the inequality x e > 0, where e is a split 
indexing a coordinate of O. Let Ai be the set containing e. Now remove e from G(0,T). 
This splits G(Ai,Bi) into components corresponding to partitions (A^, B[), . . . , (AL, Bi) of 
(Ai,Bi). Because these partitions correspond to separate components in G(Ai,Bi), they can 
appear in any order in a valid support sequence for F. Thus every point in F must satisfy 
every (P2) inequality between the pairs (A[, B^) at equality, since otherwise the (A'j, B'-) sets 
could be interchanged so that some (P3) condition is violated. First consider the case where 
all of the Aj are nonempty. Define the support (A', B') by inserting (A^, B[), . . . , (A' £ , B' £ ) in 
place of (Ai,B{) in (A, B): 

(A', £>') = (A 1 ,B 1 ), (Ai-i,Bi-i), (A[,B[), (A' e , B' e ), (A i+1 ,B i+1 ), (A k , B k ) 

and extend the signature S to S' by adding "=" signs between each of the sets in the primed 
subsequence. Then (A',B') is valid, and F = V 2 (T,0\ {e}; A', B'; S'). 

Now suppose that one of the support pairs (A'j, B'j) has Aj = 0. The associated ratio must 
be 0, which implies in turn that every ratio corresponding to the pairs (A[, B[), . . . , (A' e , B' e ) 
is 0. Furthermore, the ratios are also for any earlier support pairs. So Xf = for every 
/ € Hi = A\ U • • • U Ai. In this case set 

O' = O \ Hi 
(A',B') = (A i+1 ,B i+1 ),...,(A k ,B k ) 

S' = S restricted to the last k — i pairs of the sequence. 

By Remark II .61 we have been ignoring the non-positive ratios; however, they still exist if there 
are common edges between X and T. In this case, the edges B\ U • • • U Bi become common 
edges, and are added to the 0- valued ratio if it already exists, or form it anew, if it does not. 
Again (A! ,B') is valid, and F = V 2 (T, 0';A',B';S'). 

Next consider the (P3) constraints. For some support pair (Ai, Bi) let I\ U I2 and J\ U J2 
be partitions of Ai and Bi with I2 U J\ compatible, and consider the constraint 

HAH ||/ 2 || 
W\ \\JiW 

Let (Ai, B[), . . . , (A' k , B' k ) and (A' k+l , B' k+1 ), . . . , (A' e , B' £ ) be pairs corresponding to the com- 
ponents of G(Ii, Ji) and G(l2, J2), respectively. 

First consider the case where all of the A'j and B'j are nonempty. The same nonempty sets 
argument as above applies, and we obtain the the face F = V 2 (T,0;A',B';S') with (A',B') 
and S' defined as in the nonempty-set case above. 

Next suppose that one of the sets (A'j, B'j) has A'j = 0. As in the empty-set case above, 
this forces x/ to be for every / G S { = Ai U • • • U A u and so F = V 2 (T, O'; A', B';S') with O', 
(A ' , B') and S' defined as in the empty-set case above. 
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Now suppose that one of the sets {A'-,B'-) has B'- = 0. This forces the ratios for every 
pair in (A^, B[), . . . , (A'^, B' e ) to be oo, which in turn means that xj = for every / G S[ = 
Bi + \ U • • • U Bk- Thus if we define 

6' = o\ s[ 

(A',B') = (A 1 ,B 1 ),...,(A i _ 1 ,B i ^ 1 ) 

S' = S restricted to the first i — 1 pairs of the sequence, 

then again (A', B') is a valid sequence, and so we obtain the face F = V 2 (T, &; A 1 ', £>'; S'). As 
before, the edges Ai U • • • U A^ become common edges, and hence be added to the oo-valued 
ratio if it exists and otherwise form that ratio. 

Finally, suppose that there are pairs {A'-^B'-,) and (A"„ , B'L) with A'-, = B"-„ = 0. 
This forces all of the xj where / is not a common edge to be 0, and we just get the face 
corresponding to the common edges. 

For (C2), suppose that V and V' are vistal cells. Let F and F 1 C V' be the smallest 
faces of V and V containing V D V. By (CI), F and F' are vistal cells. 

Lemma 3.28. Distinct vistal cells have disjoint relative interiors. 

Proof. Let £ be an element in the relative interior of two faces in 7^, given by valid repre- 
sentations. Then £ satisfies (VI) and (V2) with respect to both faces, and by Theorem 13.231 
the only way this could happen is if the faces coincide. □ 

Continuing with the proof of Theorem 13.271 let p £ V C\V be in the relative interior of F 
and p' £ V DV' he in the relative interior of F' . Any point in the relative interior of the line 
segment joining p to p' lies relative interior to both F and F' , so Lemma 13.281 implies that 
F = F'. Since V and V are vistal faces to begin with, each has F = F 1 as a face, whence 
F = F' C V n V, and therefore F = F' = V n V as desired, because VflVCF. □ 

3.4 Examples of vistal complexes 

Example 3.29. To demonstrate Theorem 13.271 consider the incompatibility graph from 
Figure [3] and treat it as the incompatibility graph for two trees T and X. Take values on T 
as given in the figure, and consider the vistal cell V = V 2 (T, 0;A,B;S) defined by 

O = {x 1 ,x 2 ,x 3 , x 4 , x 5 , x 6 , x 7 , x$} 
(A,B) = ({x 1 ,x 2 ,x 3 },{t 1 }), ({x i ,x 5 ,x 6 ,x 7 ,x 8 },{t2,t 3 ,h,t 5 ,t 6 ,t 7 }) 
S = (<). 

This is a valid sequence, and in particular, using Lemma f3.21l we can assign weights as follows. 
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(The first three weights have additionally been scaled so that (P2) is satisfied strictly.) Here 
are examples of the three types of faces of V. 

• Setting the single (P2) constraint to equality: this gives the face corresponding to the 
numbers in Figure 

• Setting xj = 0: for j 7^ 5, 6 the face has the same structure as the cell V, except that 
Xj is removed from the corresponding sets. For j = 5,6, removal of Xj disconnects 
(A2, B2) by isolating £4 or {£3, £5}, respectively, and thus setting X5 or xq to collapses 
the face to the single origin point. 

• Setting the (P3) constraint with I\ = {x4,,x^,x%}, J\ = {£2, £3, £4, £5}, h = {%7,xs}, 
and J2 = {£6, £7} to equality: here 

\\hf _ 79 5 _ ||/ 2 || 2 
|| Ji|| 2 ~ 66 > 12 ~ ||J 2 || 2 ' 

Now G(l2, J2) is not connected, and has nontrivial components on vertex sets {x-j^to} 
and {x8,£7}. Thus the face obtained by setting the above inequality to equality is 
V 2 {T,0;A',B';S'), where 

(A',B') = ({xi,x 2 ,x 3 },{£i}), ({x 4 ,x 5 ,x 6 },{£ 2 ,£3,£4,£5}), ({£7}, {£e}), ({^s},{£7}) 
S> = «,=,=). 

Example 3.30. FigureHlgives the restriction of a vistal polyhedral subdivision to a maximal 
orthant in T5. The trees are depicted in Figure 4(a) , with £1 = £2 = £3 = 1- Figure [4(b)| depicts 
the cells in orthant 0({xi, X2, 2^3}) as they intersect with the set £1 + £2 + £3 = 1. The vistal 
cells are labeled with the appropriate ratio sequence, separated by "=" or "<" as indicated 
by the behavior of points in the interior of the vistal cell. We also label the six cells of lower 
dimension that are the intersections of the vistal facets. 



3.5 Multivistal complexes 

It is a straightforward matter to extend vistal cells to the case where there is a collection 
T = {T 1 , . . . , T r } of source trees in T n , and we are interested in the set of points X E T n for 
which the geodesic to each tree in T has a specified combinatorial structure. 

Definition 3.31. A premultivistal cell for a collection T of trees is a set of the form 

r 

V(T;0;A T ,B T ) = f]V(T e ,0;A e ,B e ), 
1=1 

where V(T , O; A , B ) are previstal cells, O C T n is an orthant, and 

(A T ,B T ) = {(A\B 1 )...,(A r ,B r )} 

is a collection of support pairs for (T l , X)-geodesics. A multivistal cell is the image in of 
a premultivistal cell. 
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(a) Trees T and X. 

X2 




WH „ W „ lite) 
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(b) A cross-section of the orthant corresponding to tree topology X. 
Figure 4: The vistal polyhedral subdivision between fixed tree T and variable tree X in 7i. 

Corollary 3.32. T/ie multivistal cells of tree space T n for any fixed set source trees refine 
the natural polyhedral structure ofT n to form a multivistal polyhedral subdivision ofT n . 

Proof. The common refinement of any finite collection of polyhedral subdivisions of a given 
polyhedral complex is a polyhedral subdivision of the same polyhedral complex, and so the 
result follows from Theorem 13,271 □ 

Remark 3.33. The previstal cells for any fixed source tree form a subdivision of T n , called 
a premultivistal complex, that is the image of the corresponding multivistal polyhedral sub- 
division of under the inverse £ — > y/£ of the squaring map, which is a homeomorphism. 
However, the cells in this subdivision are not polyhedral. One might hope that the premulti- 
vistal complex is a CW complex, in the standard topological sense (see [35], for example), but 
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it is not, for the same reason that multivistal polyhedral subdivisions are not CW complexes: 
the closed cells are not images of closed balls under continuous maps (a cone of positive 
dimension fails to be compact). The situation can be remedied by considering the link L n 
of the origin in 7^, namely the set of trees whose edge lengths sum to 1. Intersecting L n 
with any multivistal polyhedral subdivision yields a polyhedral CW complex whose preimage 
under the squaring map is a (non-polyhedral) C W complex. Thus a premultivistal complex 
is essentially a (noncompact, unbounded) cone over a CW complex. 

Remark 3.34. In general, the number of vistal facets is exponential in n, even within a 
single orthant [38]. Thus an efficient method to move through the vistal facets - or prune 
the list of relevant ones - would likely improve calculation time of the mean. 

4 Computing the mean in tree space 

Although the algorithm to calculate the mean in Section [2.21 is simple and seems to perform 
well for small data sets, Remark 12.61 indicates that its convergence rate is sublinear, so 
in theory it is a poor iterative method. This section outlines a general framework for a 
descent method to find the mean of a set T , . . . ,T r of n-trees. It promises to accelerate 
the convergence considerably by generalizing powerful nonlinear programming techniques to 
apply to optimization in tree space. We are currently in the process of implementing it. 

4.1 Optimality criteria 

We start by analyzing the variance function S(x) of a variable point X 6 T n whose com- 
ponents are represented by the variable vector x. For t = 1, ... , r, let 7^ be the geodesic 
from X to Ti, with associated support pair (A e ,B e ). By summing the lengths L(ji) of these 
geodesies as given by Eq. ([6]), write the variance in T n as 



with its derivative given by Eq. (fTUj) . Consider this function in its 7^-version S 2 {£) as given 
in Definition 13.21 Using the notation 



r r |~ k( 



1=1 1=1 Li=l 




and 




+1 if A\ and B\ are disjoint 

— 1 if A\ = B\ are made up of common edges, 
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then the corresponding pullback function for £ £ 7^ can be derived from Eqs. ([5]) and 

r k e 



s 2 (e) = EE(^+ii^n) 2 - ( 14 ) 



£=1 i=l 



If i(e,£) denotes the index of the set A\ containing e, then the gradient of S 2 can be obtained 
from Eq. (TH]) : 



s i(e,^) / 

The differentiability of S transfers to S 2 , as well. 

Corollary 4.1. T/ie function S 2 (£,) is continuously differentiable on the interior of every 
maximal orthant O. 

Proof. The inverse of the squaring map is continuously differentiable on the interior of O. 
Now apply Theorem 12.21 □ 

The function S 2 (£) is not necessarily convex on 7^. By Proposition 12.11 however, it 
does have a unique local minimum, which is therefore the mean. Consequently, optimality 
conditions for the function S 2 {^) on T% can be based on its behavior in any one of the 
multivistal facets in which £ lies. In particular, we have the following important result. 

Corollary 4.2. The squared image X of the Frechet mean X must satisfy V ' S 2 (X) = on 
its orthant 0{X). If X lies interior to a maximal orthant O, then X is the squared image 
of the mean if and only if the gradient satisfies VS 2 (X) = 0. These statements are true 
regardless on which multivistal facet of O the variance function is derived. 

Proof. Since by Corollary 14.11 the gradient is independent of which vistal facet it is calculated 
from, the gradient VS 2 (X) must be zero on any of them in order X to be optimal. Conversely, 
since S 2 attains a unique minimum on 7^, it follows that X must be the mean whenever 
X7S 2 (X) = on an entire maximal orthant. □ 

Remark 4.3. When a point X lies on the boundary of a maximal orthant, the gradient 
VS 2 (X) may be zero on 0(X) even if X is not the squared mean, since there may be 
a maximal orthant O D O(X) having a point with smaller variance than X. Finding O 
from X can be quite difficult, since of dS 2 (X)/^ e may be undefined or infinite for e ^ O(X). 
Furthermore, directional derivatives may fail to be continuous along orthant boundaries. This 
issue presents serious optimization difficulties in locating sample means, since there is ample 
evidence that reasonably evenly distributed data in tree space yield means that are likely 
to occur on orthant boundaries, or indeed, even to lie at the origin; see Section 15.31 Thus 
optimality conditions for the mean when it occurs on orthant boundaries is an important 
topic of further research. 
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4.2 A descent method to compute the mean 



In spite of Remark 14.31 we can suggest a basic method for finding the mean in tree space. 
The general idea is to start with some feasible tree, and construct a sequence of trees whose 
variance function is decreasing, until arriving at the mean tree. 

Algorithm 4.4 (Descent method for computing the mean). 
input Trees T 1 , . . . , T r in T n 
output The mean tree for T 1 , ... , T r 
initialize Choose some good starting point £° 6 T 2 , for example, by running Sturm's 
algorithm for a predetermined number of iterations. 
WHILE the mean has not been found: 

DO 1. Find the set M of all maximal orthants containing £*. 

2. For each V 6 M, choose a point u v in the interior of V. 

3. Use a nonlinear interior point/penalty function method to find a local min- 
imum Uy of S 2 in V. 

4. If Uy 7^ for any V £ Ai, then choose the tiy with minimum S 2 (iiy), and 
set l t+1 = uiy. 

END WHILE-DO 
RETURN 

The local minimum search in Step 3 should be both straightforward and reasonably fast, 
and the accuracy of the points £ fc as representing the true local minimum of course depends 
upon the method used to find it. Since the function S 2 is continuously differentiable on all 
V £ M, the search in fact finds a local minimum on the orthant V. Since all neighboring 
orthants are searched from £ , it follows that whenever all of these local searches converge 
back to then must necessarily be the mean. Finally, the algorithm terminates after a 
finite number of iterations, since no two in the sequence can lie in the same orthant. The 
number of iterations depends both on the number of iterations t and also the size of A4 , each 
of which may be exponentially large. Thus it is important for the implementation that a good 
starting point £° be found, and that a good method be used to determine descent directions 
in the set of maximal orthants adjacent to the point £*. We leave these implementational 
issues to a future paper. 

In general, better local search techniques and starting solutions, perhaps through a hybrid 
of Sturm's Algorithm and descent methods, could improve the accuracy and reliability of 
procedures to calculate the mean. This is an area of current research. 



5 Properties and applications of the mean 

This section contains a series of remarks, results, and computational studies related to the 
Frechet mean in tree space. Several examples in this section use the trees given in Figure [5j 
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The three orthants in Ti there are adjacent, with the shaded (e' l5 e2)-orthant missing, since 
and e 2 are not compatible. The orthants have been drawn "flattened out" in the same plane 
to make the geodesies and means easier to see, although T4 is not globally embeddable in the 
plane. For a tree T 1 (respectively, trees T 2 and T3), we specify its interior edge lengths by 
a pair of coordinates (ei,e2) (respectively, (ei,e' 2 ) and (e'i,e' 2 )). The geodesic between any 
pair of these trees is a straight line whenever it does not cross into the shaded region, and 
otherwise it is the cone path consisting of the two legs joining the given points to the origin. 
Likewise, T is the Euclidean barycenter whenever that point does not lie in the shaded region, 
and otherwise it is the point on the boundary of the shaded region that minimizes the sum 
of the squared geodesic distances to the three trees. 





Figure 5: Example for the remarks. 



5.1 Composition of the mean tree 

The topology of the mean tree depends on both the topologies and the edges lengths of the 

sample trees. Consider, for example, trees T 1 = (3, 1) and T 3 = (1,3). The mean between 

2 

these two trees is the midpoint T = (1, 1) of the segment joining them. Changing both edge 
lengths of T 1 to 5, however, yields a midpoint T = (1,2); similarly, by symmetry, changing 
both edge lengths of T 3 to 5 yield the midpoint T = (1,2). That said, in general we can 
give some indication of what edges lie in the mean tree. 

Lemma 5.1. Every edge of the mean tree is an edge of some sample tree. Furthermore, if 
an edge appears in all sample trees, then it must also appear in the mean tree. 

Proof. If a tree contains an edge not in any sample tree, then contracting this edge gives 
a tree with a smaller variance function. Now suppose that the edge e is contained in all 
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sample trees, and thus is compatible with all other edges in the sample trees. Since the mean 
contains only edges from the sample trees, edge e is also compatible with all edges in the 
mean tree. If T is any tree not containing e but whose edges are all compatible with the 
mean tree, then adding e to T with length equal to the minimum of its lengths in the source 
trees yields a tree that is closer to all of the sample trees, so T is not the mean. □ 

5.2 Other notions of consensus tree 

Several authors have proposed notions of "center" for a set T = {T 1 , . . . , T r } of points in T n - 
The Euclidean or combinatorial properties of these centers make them useful for representing 
consensus trees. All of these centers agree when T lies entirely in a single orthant of 7~ n , but 
fail to agree for more globally distributed samples from tree space. We investigate three of 
these notions of center here, in terms of their relationship to the Frechet mean T of T. 

Example 5.2 (The majority-rule consensus (MRC) tree). First introduced by Mar- 
gush and McMorris [33], this is the tree whose edge set is comprised of those edges that 
appear in at least half of the trees in T. According to Bryant's classification scheme of tree 
consensus methods [13], many consensus methods are refinements of MRC; that is, the re- 
sulting consensus tree contains the MRC tree as a subset of its set of edges. In its basic 
form, MRC does not take into account edge weights, and it is interesting to note that the 
topology of T need not refine that of the MRC tree, as indicated by the trees in Figure [5] 
with coordinates T 1 = (1,1), T 2 = (1,1), and T 3 = (5,6). The mean of these trees is the 

3 

Euclidean centroid T = (1,2). The MRC tree, on the other hand, has the topology of tree 
T 2 , and so neither tree is a refinement of the other. In contrast, the unanimous consensus 
tree — having as its edges those that appear in all sample trees — has T as a refinement, by 
Remark 15.11 It is typically not an interesting tree, as it has very few edges; e.g., the one for 
this example has no internal edges at all. 

Example 5.3 (Sturm's inductive mean). The inductive mean (Definition 12 .3|) of the 

set T, for some ordering of T, does not coincide with T, and it can differ depending upon 
the ordering. Consider the trees in Figure [5] with coordinates T 1 = (3, 10), T 2 = (3,3), and 
T 3 = (10,3). Either order having T 1 and T 3 first yields the inductive mean T 2 = (1,1). 
Either order having T 1 and T 2 first yields the inductive mean f 3 = (0.390,0.117), and either 
order having T 2 and T 3 first yields the inductive mean T 1 = (0.117,0.390). These have 
different topologies, and none of them equals T, which has all edges 0. 

Example 5.4 (The BHV centroid). Billera, Holmes, and Vogtmann [12] define the cen- 
troid of T = {T 1 , . . . ,T r } inductively on r. For r = 2, the centroid is the midpoint of the 
two trees. For r > 2, the centroid is obtained as follows: set T 1 = T and inductively find the 
centroid of each subset of r — 1 trees in T 1 to obtain a new set T 2 of r trees. Repeat this 
process on the new set, creating a sequence T^T 2 , ... of r-sets of trees. The BHV centroid 
of T is the limit T of any sequence of points chosen from each of the sets T\ This process 
converges in a general global NPC space [121 Theorem 4.1]. 
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Billera, Holmes, and Vogtmann note that in Euclidean space, the centroid and Frechet 
mean coincide. This is not the case in tree space. Consider the trees in Figure [5] with 
coordinates T 1 = (2,4), T 2 = (2,2), and T 3 = (4,2). Then T is again the origin. The 
BHV centroid is obtained by first taking the midpoints of the pairwise geodesies for these 
three points. This forms set T' = {f 1 = (2, 1),T 2 = (0,0), f 3 = (2, 1)}, with the associated 
triangle A lying entirely within the union of the three orthants containing T , T 2 , and T 3 . 
But this means that the BHV centroid subsequently converges to the Euclidean barycenter 
of A, which is T 2 = (1/3, 1/3) ^ T. Thus the BHV centroid is not equal to the Frechet mean. 

5.3 Stickiness of the mean 

Sullivant [48] noticed the tendency of the Frechet mean to be sticky, which in this context 
means that perturbing one or more of the trees in the set T does not necessarily result in any 
of the coordinates of T changing. Take, for example, the points T 1 = (3, 10), T 2 = (3, 3), and 
T 3 = (10, 3). The mean T lies at the origin, and remains there even if the coordinates of any 
of the three trees T l are perturbed even up to a full unit. Sticky means occur exclusively on 
orthants of lower dimension, underscoring the importance of closely investigating properties 
of mean trees that lie on orthant boundaries. 

The notion of stickiness has been quantified via a Central Limit Theorem for means of 
probability distributions on certain NPC spaces [101 129] . 

5.4 Application to data sets in phylogenetics and physiology 

Statistical applications of this research are important in several areas of mathematics, biology, 
and medicine. Here, we consider two well-studied data sets, one in phylogenetics and one in 
physiology, with respect to the Frechet mean. 

Example 5.5 (Gene trees vs. species trees). A gene tree is a phylogenetic tree rep- 
resenting the evolutionary history of a particular gene from some set of species, calculated 
using molecular genetic methods, for example. In contrast, a species tree is a phylogenetic 
tree representing the evolutionary history of the set of species: the history of population bi- 
furcations leading to divergence. Due to natural processes such as incomplete lineage sorting, 
gene trees for different genes can have different topologies, even when sampled from the same 
set of individuals — let alone the same set of species — and hence a gene tree need not share its 
topology with the species tree (see [32], for example). Furthermore, the most likely gene tree 
topology need not agree with the species tree topology pOf. However, species trees are usually 
reconstructed from gene trees, and a major open question is how best to accomplish this. 

We examined the yeast data set of Rokas et al. [IS]. For eight species of yeast, they 
identified 106 genes and reconstructed the corresponding gene tree with edge lengths for 
each using a maximum likelihood approach. Of these 106 gene trees, there were 21 different 
topologies. We used Sturm's algorithm to compute the mean of these gene trees. This mean 
tree had the same topology as the agreed-upon species tree [22]. In general, there may be a 



34 



connection between the topology of the mean tree of a set of gene trees and the topology of 
the corresponding species tree. However, as a consequence of stickiness, when branch lengths 
are taken into account, the mean gene tree does not necessarily identify species tree |48j ; that 
is, two finite samples of gene trees can yield the same mean tree but have different species 
trees. However, we conjecture that the topology of the species tree is a refinement of the 
topology of the Frechet mean of the gene trees. That is, stickiness of the Frechet mean forces 
some edges to have zero length but should not add any extraneous edges to this mean. 

Example 5.6 (Applications to brain artery structure). Analyzing the shape and struc- 
ture of arteries in the brain can help predict the progression of some diseases |15} [T6] . These 
arteries have a tree structure, which can be extracted from Magnetic Resonance Angiography 
(MRA) images, following the procedure in [7]. Our dataset of artery trees is a subset of 109 
brain scans from healthy human subjects of a wide range of ages that is available from CASI- 
LAB at |http : / /hdl . handle . net/ 19 26/594, The extracted artery trees have been further 
cleaned up as described in [6]. As these trees were unlabelled, 185 landmark locations on 
the brain were identified using methods from [37] and attached to the trees as leaves. This 
gave a dataset of 85 labelled trees in which the edge lengths correspond to the lengths of the 
artery segments, with the lengths of the leaf edges again ignored. Variation of the unlabelled 
dataset have previously been analyzed under some different statistical models (e.g. [T4"l [50]). 

We ran our implementation of Sturm's algorithm on this data set, with A = 5, e = 10 -5 , 
and K = 1000 000. While Sturm's algorithm did not converge within these parameters, the 
final mean tree approximation was within 10 -6 of the star tree (the tree with all internal 
edges having zero length and all leaf edges having length equal to the average of that edge in 
all trees in this dataset), but had higher variance than the star tree. We thus conclude the 
mean is the star tree for this artery set. However, we see by other measures that the artery 
trees are not randomly distributed. In particular, we can create a new dataset by either 
independently permuting the leaves of each tree in our original dataset (called a permuted 
dataset) or by sampling with replacement from our dataset (called a resampled dataset). We 
generated 100 permuted and 100 resampled datasets, computed the variance of both the star 
tree and the approximate mean tree for each dataset, and selected the lowest one for each 
dataset. Figure [6] is a box plot of these variances, depicting the sample minimum, lower 
quartile, median, upper quartile, sample maximum, and outliers for both datasets. From 
this plot, we see that the two distributions are not equal, which indicates that the artery 
trees are not randomly distributed in tree space, controlling for average branch length. We 
believe instead that the mean being the star tree is an example of stickiness, at least partially 
caused by differences at the root of the artery trees. The root of the artery trees represents 
the Circle of Willis, a circle of arteries at the base of the brain, from which all other brain 
arteries extend. There is significant natural variation in which arteries extending up from the 
Circle of Willis supply which sections of the brain [12] . This variation translates into between 
3 and 5 branches leaving the root in our artery trees. With such a fundamental difference in 
artery tree topology, it is not unexpected that the mean is the star tree. 
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Variance of permuted -leaf sets vs. resampled sets 



o h 1 I f - h 

4 

i i i i 

K co o 

id o5 o5 

Variance 

Figure 6: Distribution of the variance of datasets generated from the original artery trees in 
two different ways, namely by resampling with replacement and by independently permuting 
the leaves of the original trees. 

6 Globally nonpositively curved spaces 

Virtually all of our treatment of tree spaces extends to more general global NPC spaces. This 
section reframes the concepts and notation of the paper in the context of global NPC spaces, 
particularly orthant spaces, and shows how the results of the paper generalize to these spaces. 

6.1 The geometry of nonpositively curved spaces 

Fix a metric space T = (T, d). A path in T is the image of a continuous map 7 : [0, 1] — > T '■ 
Write 7a = 7(A) for < A < 1. The length of 7 is the supremum of all sums 

d(7x , 7xi ) + d(7xi , 7* a ) H + d (7x fe _! , lx k ) 

such that < Xq < • • • < Xj. < 1. A path is a (, global) geodesic if the distance d{pf x ,ly) between 
any pair of points on 7 equals the length of that portion of 7 between them. A geodesic space 
is a complete metric space such that every pair {x, y} of points is joined by a path 7 whose 
length is the distance d(x, y) between x and y. 

Definition 6.1. A metric space (T,d) is globally nonpositively curved, also known as global 
NPC or CAT(O), if for every triple of points a,b,c E T, any point x on a geodesic joining 
a to b, and any reference triangle a'b'c' in Euclidean space with edge lengths d(a,b), d(b,c), 
and d(a, c), the unique point x' on a'b' at distance d(a, x) from a' satisfies d(x, c) < | \x' — c'\ \ . 

The definition essentially says that triangles created by joining points by geodesies in a 
global NPC space are "skinnier" than their counterparts in Euclidean space. 

Lemma 6.2 ([171 Proposition 2.3]). In a global NPC space every pair of points is joined by 
a unique geodesic. □ 



re sampled — 
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A real- valued function / : T — > K is convex if f o 7 is convex for all geodesies 7; that is, if 



for all geodesies 7 : [0, 1] — > T. 

Example 6.3. For any point t £ T, the distance dt(x) = d(x,t) from a point x £ 7" to i is 
a convex function of x |47l Corollary 2.5 and subsequent Remark (i)]. 

A real- valued function / : T — > M is strictly convex if Eq. (|15p holds strictly for < A < 1 . 

Lemma 6.4 ( [47[ Proposition 1.7 and Remark 1.8]). Any strictly convex continuous function 
on a global NPC space attains a unique minimum. □ 

Corollary 6.5. If T = {t , . . . ,t r } is a set of points in 7~, and f : W + 1— > K is any (strictly) 
convex function, then the function F : T *— > M. defined by 



is a (strictly) convex function. 

In particular, the variance function for a set T of points is a convex function and hence 
attains a unique minimum at the Frechet mean. 

6.2 Means and variances in global NPC spaces 

This subsection generalizes the notion of mean and variance to general probability measures 
in global NPC spaces. The results follow from those of Sturm [37] in this area. Let 'P(T) be 
the set of probability measures on a global NPC space T ■ If p £ V(T) is such a measure, 
then its variance is 



The variance can be infinite in general, but not in the case of most interest to us, when p 
has finite support, meaning that there is a set T = {t , . . . ,t r } of points in T, along with 
nonnegative weights u±, . . . , w r satisfying oj\ + ■ ■ ■ + oj r = 1, such that the point ti has mass 
p(t{) = W{ for i = 1, . . . ,r. Let V 2 (T) be the set of measures in V(T) having finite variance. 

Proposition 6.6 ([33 Proposition 4.3]). For a global NPC space T and probability measure 
p £ V 2 (T), there is a unique point p £ T such that var(p) = Jj- d 2 (p,y)p(dy). 

The point p is referred to as the Frechet mean or barycenter in this context as well, and 
when p has finite support with Ui = - for all r, it is a direct generalization of the definition 
of mean given in Section [2j The notion of inductive mean given by Definition 12.31 extends 
easily to an arbitrary global NPC space T, and the following result generalizes Theorem 12.41 



/(7a)<(1-A)/( 7o ) + A/( 7 i) 



(15) 



F(x) = f(d t i(x),...,dtr(x)) 
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Theorem 6.7 ( |47| Theorem 4.7]). For a global NPC space T and probability measure 
p € V 2 (T), let X l ,X 2 ,... be a sequence of independent and identically distributed ran- 
dom variables drawn from p. Then with probability 1, the sequence of inductive mean values 
pi, P2, ■ ■ ■ approaches the mean ~p of p; that is, 



- £ x 



e , -r. 



Corollary 6.8. The convergence properties of the sequence of inductive means given by 
Algorithm \2.5\ continue to hold on any probability distribution p 6 V 2 (T), by sampling the 
points of T according to the specified distribution. 

Remark 6.9. As an application of Corollary 16.81 Markov chain Monte Carlo (MCMC) 
simulations produce phylogenetic trees sampled independently from a fixed finite-variance 
distribution on the entire space T n . Calculating inductive mean values of repeated samples 
from this distribution results in a method to approximate the mean of the distribution. 



6.3 NPC orthant spaces 

Definition 6.10. The orthant space 0(£,£l) consists of a set £ of axes together with a 
simplicial complex !1 C 2 £ , called the scaffold complex. Two elements of £ are compatible if 
they appear in some face of Q. Each face F £ $7 is associated with a copy Of of IR+, the 
orthant associated with F. The orthant space 0(£,Q) is the union of the orthants Of for 
F € f2, with points identified whenever their nonzero coordinates agree on all elements of £. 

An orthant space can be thought of as constructed by gluing together orthants according 
to instructions laid out by the scaffold complex, and in fact the scaffold complex is (homeo- 
morphic to) the link of the origin in the orthant space 0(£, 0). 

Example 6.11. Tree space Tn is an orthant space: £ corresponds to the set of splits on 
{0, . . . , , n}, and £1 corresponds to the collection of sets of splits that are compatible in the 
sense of Section 11.11 

A path in an orthant space 0(£, ft) is defined as in Section 16.11 A locally length- 
minimizing path is a geodesies, which always consists of a finite number of linear legs through 
intermediate orthants of T, as in the case of tree space (Section II. 2p . As with tree space, 
0(£,£l) is always path-connected. 

Although any orthant space is geodesic, it may not be global NPC. 

Example 6.12. The space T = 0(£,Q), where £ is indexed by {1,2,3} and the scaf- 
fold complex has facets {1,2}, {1,3}, and {2,3} is not global NPC. Indeed, the two 
points x = (1,0,0) and y = (0,1,1) in T have a pair of geodesies between them, namely 
[(1,0, 0), (0, 1/2, 0)] U [(0, 1/2, 0), (0, 1, 1)] and [(1,0, 0), (0, 0, 1/2)] U [(0, 0, 1/2), (0, 1, 1)]. By 
Lemma 16.21 T cannot be global NPC. 
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M. Gromov [21] determined conditions on f2 that characterize when 0(£,Q) has non- 
postive curvature (in fact, Gromov worked with arbitrary cubical complexes), based on the 
following standard notion from geometric combinatorics. 

Definition 6.13. The simplicial complex $7 is flag if F £ 0, whenever all pairs of elements 
in F are compatible. 

Proposition 6.14 (|24j). An orthant space 0{£,Q,) is global NPC if and only if Q is flag. 

In particular, tree space T n is global NPC, since its scaffold complex is defined precisely 
by the pairwise compatibility between its splits (this is the proof given in [E]). Generally, 
any global NPC orthant space can be defined entirely by its set of compatible elements. 

Definition 6.15. The scaffold graph Q{£,Q) of an orthant space 0(£,Q) is the graph with 
vertex set £ whose edges are the pairs of compatible elements of £. 

Lemma 6.16. The orthants of a global NPC orthant space 0{£,Q) are precisely the clique 
sets (sets of mutually compatible edges) of the scaffold graph Q(£,Q). □ 

Thus there is a one-to-one correspondence between orthant spaces and graphs. A general 
global NPC orthant space 0(£, O) need not have all of its maximal orthants the same dimen- 
sion, since maximal orthants correspond to the maximal cliques in Q{£ , Q). The dimension of 
the maximal orthants, however, is not relevant to any of the results in the previous sections, 
except when the dimension is given explicitly. 

Example 6.17. The space of trees in which each split is associated with an m-dimensional 
vector instead of a single length is an NPC orthant space. In this case, the scaffold graph is 
the scaffold graph of tree space 7~ n , with each vertex replaced by K m , the complete graph on 
m vertices. Our software implementation also computes geodesies and means in this space. 

Example 6.18. The generality of scaffold graphs to define any global NPC orthant space 
provides an opportunity to extend the statistical structures of this paper to a wider range 
of applications. As one example, consider a computer network specified by its computa- 
tional devices and the graph Q denoting those pairs of computers that are compatible with 
each other. A local area network (LAN) for this system is a set C of mutually compatible 
computers — that is, a clique of Q. A local network configuration (LNC) is a LAN C together 
with a measure w e of participation of each computer e G C in the LAN C. Some important 
areas of analysis of the network Q might be the relationship between the LNCs associated 
with Q, in terms of the number and participation weight of common computers and the rela- 
tive compatibility of the noncommon computers (although it does not model chaining-related 
measures such as the number of nodes in a communications path). The global NPC orthant 
space generated by Q would be a good framework for answering questions like this associated 
with the LANs of the network. 
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The combinatorics of geodesies in Section[T]generalizes immediately to global NPC orthant 
spaces, using the generalized notation in this section. None of the proofs in Sections [THU rely 
on particulars of tree space except the flag property. Thus we have the following. 

Corollary 6.19. The results in Sections Gr0 (except for statements specifying dimension) 
extend to arbitrary global NPC orthant spaces, using the definitions in this section. In par- 
ticular, the GTP algorithm 14$ for finding geodesies in tree space, Sturm's Algorithm (Al- 
gorithm \2. 5|) . and the Descent Method (Algorithm apply in the more general setting of 
global NPC orthant spaces. 

Remark 6.20. The results here extend even further. For example, Ardila, Owen, and 
Sullivant [4] extend the global NPC theory, and in particular the GTP algorithm, to the case 
of cubical complexes, where orthants are replaced by Euclidean cubes. Sturm's algorithm 
extends to these global NPC cubical complexes, and there is every reason to believe that the 
idea of vistal cells and the Descent Method can be extended as well, although without the 
polyhedrality. Note that Corollary 16.81 was independently observed by Bacak [8]. 
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